Search results for "learner corpus"
showing 4 items of 4 documents
Comparing formulaicity of learner writing through phrase-frames: a corpus-driven study of Lithuanian and Polish EFL student writing
2018
Learner corpus research continues to provide evidence of how formulaic language is (mis)used by learners of English as a foreign language (EFL). This paper deals with less investigated multi-word units in EFL contexts, namely, phrase-frames (Fletcher 2002–2007), i.e. sets of n-grams identical except for one word (it is * to, in the * of). The study compares Lithuanian and Polish learner writing in English in terms of phrase-frames and contrasts them with native speakers. The analysis shows that certain differences between Lithuanian and Polish learners result from transfer from their native languages, yet both groups of learners share many common features. Most importantly, the phrase-frame…
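The definition of phrase-frames in the abstract above (sets of n-grams identical except for one word) can be illustrated with a minimal Python sketch. This is not the study's procedure: the function names `phrase_frames` and `productive_frames`, the default n-gram length of 4, and the restriction of the variable slot to non-edge positions are all assumptions made for illustration, chosen to match the two examples given (it is * to, in the * of).

```python
from collections import defaultdict


def phrase_frames(tokens, n=4):
    """Map each frame (an n-gram with one slot replaced by '*') to its filler counts.

    Assumption: only internal positions are opened as slots, as in the
    abstract's examples 'it is * to' and 'in the * of'.
    """
    frames = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(1, n - 1):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


def productive_frames(tokens, n=4, min_fillers=2):
    """Keep only frames whose slot is filled by at least min_fillers distinct words."""
    frames = phrase_frames(tokens, n)
    return {
        frame: sorted(fillers)
        for frame, fillers in frames.items()
        if len(fillers) >= min_fillers
    }


if __name__ == "__main__":
    # Hypothetical toy input, not data from the study.
    sample = "it is easy to see that it is hard to say".split()
    for frame, fillers in productive_frames(sample).items():
        print(" ".join(frame), fillers)
```

A frame only becomes interesting once several different words fill its slot, which is why the sketch filters on the number of distinct fillers. A real study would also need tokenisation, frequency thresholds and normalisation across corpora of different sizes.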
Analysing Lexical Density and Lexical Diversity in University Students’ Written Discourse
2015
This study analyses both lexical density and lexical diversity in the written production of two groups of first year students at the Universitat de Valencia at the beginning and end of a one-semester teaching period. These results were compared with those obtained by a third group of students aiming at level C2. Lexical density was tested using Textalyser (http://textalyser.net) and lexical frequency used the software RANGE (Nation and Heatly, 1994). Our results prove that the students from both groups at level B1 show the same progression between writing tasks 1 and 3. Furthermore, we can claim that it is possible to obtain a reliable measure of lexical richness which is stable ac…
Establishing a Standardised Procedure for Building Learner Corpora
2014
Decisions at the outset of preparing a learner corpus are of crucial importance for how the corpus can be built and how it can be analysed later on. This paper presents a generic workflow to build learner corpora while taking into account the needs of the users. The workflow results from an extensive collaboration between linguists who annotate and use the corpus and computational linguists who are responsible for providing technical support. The paper addresses the linguists’ research needs as well as the availability and usability of language technology tools necessary to meet them. We demonstrate and illustrate the relevance of the workflow using results and examples from our L1 learner cor…
Using Automatic Morphological Tools to Process Data from a Learner Corpus of Hungarian
2014
The aim of this article is to show how automatic morphological tools originally used to analyze native speaker data can be applied to process data from a learner corpus of Hungarian. We collected written data from 35 students majoring in Hungarian studies at the University of Zagreb, Croatia. The data were analyzed by magyarlanc, a toolkit comprising a sentence splitter, morphological analyzer, POS-tagger and dependency parser, which found 667 unknown word forms. We investigated the recommendations made by the Hungarian spellchecker hunspell for these unknown words and manually chose the correct forms. It was found that if the first suggestion made by hunspell was automatically accepted, an accuracy score…